Abstract

Advances in fabrication technology and structural engineering have led to a growing use
of slender structures, making them more susceptible to static and dynamic actions that
may lead to some sort of damage. In this context, regular inspections and evaluations are
necessary to detect and predict structural damage and establish maintenance actions
able to guarantee structural safety and durability with minimal cost. However, these
procedures are traditionally quite time-consuming and costly, and techniques allowing
a more effective damage detection are necessary. This paper assesses the potential
of Artificial Neural Network (ANN) models in the prediction of damage localization in
structural members, as a function of their dynamic properties (the first three natural
frequencies are used). Based on 64 numerical examples from damaged (mostly) and
undamaged steel channel beams, an ANN-based analytical model is proposed as a
highly accurate and efficient damage localization estimator. The proposed model yielded
maximum errors of 0.2 and 0.7 % concerning 64 numerical and 3 experimental data
points, respectively. Due to the high quality of the results, the authors' next step is the
application of similar approaches to entire structures, based on much larger datasets.

Keywords: Structural Health Monitoring; Damage Localization; Steel Beams; Dynamic 
Properties; Natural Frequencies; Artificial Neural Networks. 

Resumen (Spanish abstract, in English translation)

Advances in fabrication technology and structural engineering have led to the growing
use of slender structures, which are consequently more vulnerable to static and dynamic
actions that may cause some type of damage. In this context, regular inspections and
evaluations are necessary to detect and predict damage in structures, and to establish
maintenance actions able to guarantee structural safety and durability at an optimized
cost. However, these procedures are typically very time-consuming and costly, and
techniques allowing a more effective damage detection are necessary. This article
assesses the potential of artificial neural networks (ANN) in the prediction of damage
localization in structural members, as a function of their dynamic characteristics (the
first three natural frequencies of vibration are used). Based on 64 numerical examples of
steel channel beams, with (mostly) and without damage, this work proposes an ANN-based
analytical model characterized by high accuracy and efficiency. The proposed model
yielded errors of 0.2 and 0.7% with respect to 64 numerical and 3 experimental data
points, respectively. Due to the high quality of the results, the authors' next step will be
the application of similar approaches to complete bridge or building structures,
consequently involving much larger databases.

Keywords (Palabras Clave): Structural Health Monitoring; Damage Localization; Steel Beams;
Dynamic Properties; Natural Frequencies; Artificial Neural Networks.


INTRODUCTION 

Advances in fabrication technology and structural engineering have led to a growing
use of slender structures in the construction industry. Those structures (or structural
members) are more susceptible to static and dynamic actions that may lead to damage
and/or excessive vibration. In this context, regular inspections and evaluations are
necessary to detect and predict structural damage and establish maintenance actions
able to guarantee structural safety and durability with minimal cost. However, these
procedures are traditionally quite time-consuming and costly. Thus, techniques allowing
a more efficient and less resource-dependent damage detection are in high demand
and will contribute to a more sustainable built environment.

In recent years, several authors (e.g., [1-3]) have concluded that structural damage
detection is a problem of pattern recognition, in which a classification is made as a
function of physical properties of a system. Within machine learning, several types of
Artificial Neural Networks (ANN) (e.g., feedforward nets, self-organizing maps, learning
vector quantization) can become quite effective damage detection tools when
used in conjunction with the dynamic properties of a system (e.g., [4-5]). Note that
nowadays it is quite straightforward to accurately estimate important dynamic
properties (e.g., natural frequencies) of (possibly damaged) built structural systems,
by means of accelerometers and/or other simple devices, and existing software (e.g.,
ARTeMIS Modal 4.0 [6]). According to Bandara et al. [7] and Ahmed [8], a clear challenge
concerning ANNs is the fact that they typically need structural data of both damaged
and intact structures to be able to classify satisfactorily. If the structure is not considered
damaged in its current state, the information regarding the damaged state will be
unavailable unless detailed structural models are used to generate this information, such
as numerical ones based on the Finite Element Method (FEM).

Several authors have published applications of machine learning to damage
characterization in structural members (e.g., [9-12]). Nonetheless, none of those studies
employed exactly the same structure and input/output variables considered in this work.
Moreover, the accuracy provided by those solutions is typically insufficient (maximum
error for all data points > 5%) for what the authors of this paper consider to be acceptable
(safe) in structural engineering practice. Thus, this paper primarily aims to assess the
potential of ANN-based models in the prediction of damage localization in structural
members, as a function of their dynamic properties (the first three natural frequencies
are used in this work). Based on numerical data from damaged (mostly) and undamaged
steel channel beams, an ANN-based analytical model is proposed and tested for both
numerical and experimental data. Once it is proved that the approach taken works well for
structural members, the authors' next step (in the very near future) is to apply similar
procedures to entire bridge or building structures.

DATA GATHERING 

Inspired by the experimental research of Brasiliano [13], who assessed the effect of
structural damage on natural (free vibration) frequency values, the data used for the
present investigation concerns damaged (mostly) and undamaged ASTM A36 steel
channel beams (U101.6x4.67 [14]) with a length of 2.155 m and free-free boundary
conditions. Sixty-four distinct beams (also called examples or data points in this
manuscript) were simulated in ANSYS FEA software [15] to obtain a 3-input and 1-output
dataset for ANN design. The first three natural frequencies (Hz) of the beam are the
input (independent) variables (see Tab. 1), whereas the damage location is the output
(dependent) variable. The latter is given by the longitudinal distance (m) from the beam's
edge to the mid-point of the local cross-section reduction that defines the damage
(see Fig. 1(a)). For the 13 undamaged beams, the damage location adopted is non-null,
randomly taken below 0.005 m, an approach typically providing better ANN-based
approximations, according to the authors' experience.

FIGURE 1. Damaged beam tested by Brasiliano (2005): (a) experimental layout and damage location details, and
(b) undamaged and damaged cross-sections.

[Figure 1 annotation: Damage Localization (ANN output variable)]

TABLE 1. Three first natural frequencies (ANN input variables): numerical vs. test (Brasiliano 2005) results.



Timoshenko beam FEs of type BEAM188 [15], characterized by six degrees of freedom
per node, were employed in all numerical models. For validation purposes, the first two
models were used to predict the first three natural frequencies of two beams tested by
Brasiliano [13] (also reported in [16]). These beams are characterized by the material and
geometrical properties mentioned before, one being undamaged (intact) and the other
damaged. The latter was divided into 33 equal longitudinal elements and a 10 mm reduction
of its cross-section (shortening of both flanges) was performed in elements 18 and 19, as
illustrated in Fig. 1. Tab. 1 presents the validation results in terms of natural frequencies,
as well as the corresponding numerical modal shapes. The maximum error of 3.9 %
indicates the suitability of the FE model for the present study. Once the numerical model
was validated, 50 other damage scenarios were simulated, varying damage extent
and/or location. The last 12 models were made without damage but under different
temperatures from -5 to 40 degrees Celsius. Considering a room temperature of 22 °C,
distinct Young's moduli were adopted as proposed by Callister and Rethwisch [17]. The
dataset used in ANN design can be found online in [18]. The next section provides all
details concerning the ANN formulation, analyses and results.

ARTIFICIAL NEURAL NETWORKS 
Introduction 

Machine learning, one of the six disciplines of Artificial Intelligence (AI) without which
the task of having machines acting humanly could not be accomplished, allows us to
"teach" computers how to perform tasks by providing examples of how they should be
done [19]. When there is abundant data (also called examples or patterns) explaining a
certain phenomenon, but its theory richness is poor, machine learning can be a perfect
tool. The world is quietly being reshaped by machine learning, the Artificial Neural
Network (also referred to in this manuscript as ANN or neural net) being its (i) oldest [20]
and (ii) most powerful [21] technique. ANNs also lead the number of practical applications,
virtually covering any field of knowledge [22-23]. In its most general form, an ANN is a
mathematical model designed to perform a particular task, based on the way the human
brain processes information, i.e. with the help of its processing units (the neurons). ANNs
have been employed to perform several types of real-world basic tasks. Concerning
functional approximation, ANN-based solutions are frequently more accurate than
those provided by traditional approaches, such as multi-variate nonlinear regression,
besides not requiring a good knowledge of the function shape being modelled [24].

The general ANN structure consists of several nodes disposed in L vertical layers (input
layer, hidden layers, and output layer) and connected between them, as depicted in Fig.
2. Associated to each node in layers 2 to L, also called neuron, is a linear or nonlinear
transfer (also called activation) function, which receives the so-called net input and
transmits an output (see Fig. 5). All ANNs implemented in this work are called feedforward,
since data presented in the input layer flows in the forward direction only, i.e. every node
only connects to nodes belonging to layers located at the right-hand side of its layer, as
shown in Fig. 2. ANNs' computing power makes them suitable to efficiently solve small
to large-scale complex problems, which can be attributed to their (i) massively parallel
distributed structure and (ii) ability to learn and generalize, i.e. produce reasonably
accurate outputs for inputs not used during the learning (also called training) phase.

FIGURE 2. Example of a feedforward neural network. 





Learning 

Each connection between 2 nodes is associated to a synaptic weight (real value), which,
together with each neuron's bias (also a real value), are the most common types of
neural net unknown parameters that will be determined through learning. Learning is
nothing else than determining network unknown parameters through some algorithm
in order to minimize the network's performance measure, typically a function of the
difference between predicted and target (desired) outputs. When ANN learning has an
iterative nature, it consists of three phases: (i) training, (ii) validation, and (iii) testing.
From previous knowledge, examples or data points are selected to train the neural net,
grouped in the so-called training dataset. Those examples are said to be "labelled" or
"unlabeled", whether they consist of inputs paired with their targets, or just of the inputs
themselves. Learning is called supervised (e.g., functional approximation, classification)
or unsupervised (e.g., clustering), whether the data used is labelled or unlabeled,
respectively. During iterative learning, while the training dataset is used to tune network
unknowns, a process of cross-validation takes place by using a set of data completely
distinct from the training counterpart (the validation dataset), so that the generalization
performance of the network can be attested. Once "optimum" network parameters are
determined, typically associated to a minimum of the validation performance curve
(called early stop, see Fig. 3), many authors still perform a final assessment of the model's
accuracy, by presenting to it a third fully distinct dataset called "testing". Heuristics
suggest that early stopping avoids overfitting, i.e. the loss of the ANN's generalization
ability. One of the causes of overfitting might be learning too many input-target examples
suffering from data noise, since the network might learn some of its features, which do
not belong to the underlying function being modelled [25].
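
The early-stop idea can be illustrated with a minimal Python sketch (the paper's own implementation is in MATLAB and is not reproduced here). The function name `early_stop_epoch` and the validation curve below are hypothetical, chosen only to show how the epoch at the minimum of the validation performance curve would be picked.

```python
def early_stop_epoch(validation_errors):
    """Return the 0-based epoch index where the validation error is smallest.

    Illustrative only: the network parameters stored at this epoch would be
    the "optimum" ones under the early-stop criterion.
    """
    best_epoch = 0
    for epoch, err in enumerate(validation_errors):
        if err < validation_errors[best_epoch]:
            best_epoch = epoch
    return best_epoch


# Hypothetical validation curve: it falls, then rises once overfitting sets in.
val_curve = [0.90, 0.50, 0.30, 0.25, 0.28, 0.35]
best = early_stop_epoch(val_curve)
```

In this made-up curve the training error could keep decreasing after the selected epoch, which is exactly the situation the validation dataset is meant to expose.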


FIGURE 3. Cross-validation - assessing network's generalization ability. 



Implemented ANN features 

The "behavior" of any ANN depends on many "features", 15 of which were implemented in
this work (including data pre/post processing ones). For those features, it is
important to bear in mind that no ANN guarantees good approximations via extrapolation
(either in functional approximation or classification problems), i.e. the implemented ANNs
should not be applied outside the input variable ranges used for network training. Since
there are no objective rules dictating which method per feature guarantees the best
network performance for a specific problem, an extensive parametric analysis (composed of
nine parametric sub-analyses) was carried out to find 'the optimum' net design. A description
of all implemented methods, selected from state-of-the-art literature on ANNs (including
both traditional and promising modern techniques), is presented next. Tabs. 2-4 show
all features and methods per feature. The whole work was coded in MATLAB [26], making
use of its neural network toolbox when dealing with popular learning algorithms (1-3 in
Tab. 4). Each parametric sub-analysis (SA) consists of running all feasible combinations (also
called "combos") of pre-selected methods for each ANN feature, in order to get performance
results for each designed net, thus allowing the selection of the best ANN according to a
certain criterion. The best network in each parametric SA is the one exhibiting the smallest
average relative error (called performance) for all learning data.

It is worth highlighting that, in this manuscript, whenever a vector is added to a matrix, it
means the former is to be added to all columns of the latter (valid in MATLAB).

Qualitative Variable Representation (feature 1)

A qualitative variable taking n distinct "values" (usually called classes) can be represented in 
any of the following formats: one variable taking n equally spaced values in ]0,1], or 1-of-n 
encoding (boolean vectors - e.g., n=3: [1 0 0] represents class 1, [0 1 0] represents class 2, 
and [0 0 1] represents class 3). After transformation, qualitative variables are placed at the 
end of the corresponding (input or output) dataset, in the same original order. 
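
As a hedged illustration of the two representations, the Python sketch below encodes a class index either as a 1-of-n boolean vector or as a single value in ]0, 1]. The function names are hypothetical, and the choice of k/n for the equally spaced values is an assumption (the text only states that the n values are equally spaced in ]0, 1]).

```python
def one_of_n(class_index, n):
    """1-of-n (boolean vector) encoding; class_index is 1-based."""
    if not 1 <= class_index <= n:
        raise ValueError("class_index must lie in 1..n")
    return [1 if k == class_index else 0 for k in range(1, n + 1)]


def equally_spaced(class_index, n):
    """Single variable with n equally spaced values in ]0, 1].

    Assumption: class k is mapped to k / n, so the last class maps to 1.
    """
    if not 1 <= class_index <= n:
        raise ValueError("class_index must lie in 1..n")
    return class_index / n


# n = 3 classes, as in the example given in the text.
encoded = [one_of_n(k, 3) for k in (1, 2, 3)]
```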

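TABLE 2. Implemented ANN features (F) 1-5.

METHOD | F1: Qualitative Var Represent | F2: Dimensional Analysis | F3: Input Dimensionality Reduction | F4: % Train-Valid-Test | F5: Input Normalization
-------|-------------------------------|--------------------------|------------------------------------|------------------------|------------------------
1      | Boolean Vectors               | Yes                      | Linear Correlation                 | 80-10-10               | Linear Max Abs
2      | Eq Spaced in ]0,1]            | No                       | Auto-Encoder                       | 70-15-15               | Linear [0,1]
3      | -                             | -                        | -                                  | 60-20-20               | Linear [-1,1]
4      | -                             | -                        | Ortho Rand Proj                    | 50-25-25               | Nonlinear
5      | -                             | -                        | Sparse Rand Proj                   | -                      | Lin Mean Std
6      | -                             | -                        | No                                 | -                      | No
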

Dimensional Analysis (feature 2) 

The most widely used form of dimensional analysis is the Buckingham's π-theorem, which
was implemented in this work as described in [27].

Input Dimensionality Reduction (feature 3) 

When designing any ANN, it is crucial for its accuracy that the input variables are
independent and relevant to the problem [28, 29]. There are two types of dimensionality
reduction, namely (i) feature selection (a subset of the original set of input variables is used),
and (ii) feature extraction (transformation of initial variables into a smaller set). In this work,
dimensionality reduction is never performed when the number of input variables is less than
six. The implemented methods are described next.

Linear Correlation 

In this feature selection method, all possible pairs of input variables are assessed with
respect to their linear dependence, by means of the Pearson correlation coefficient $R_{XY}$,
where X and Y denote any two distinct input variables. For a set of n data points $(x_i, y_i)$,
$R_{XY}$ is defined by

$$R_{XY} = \frac{Cov(X,Y)}{\sqrt{Var(X)\,Var(Y)}} = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^2\,\sum_{i=1}^{n}(y_i-\bar{y})^2}} \tag{1}$$

where (i) Var(X) and Cov(X, Y) are the variance of X and covariance of X and Y, respectively,
and (ii) $\bar{x}$ and $\bar{y}$ are the mean values of each variable. In this work, cases where
$|R_{XY}| > 0.99$ indicate that one of the variables in the pair must be removed from the ANN
modelling. The one to be removed is the one appearing less in the remaining pairs (X, Y) where
$|R_{XY}| \leq 0.99$. Once a variable is selected for removal, all pairs (X, Y) involving it must be
disregarded in the subsequent steps for variable removal.
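
A minimal Python sketch of the pair screening is given below, assuming the usual Pearson formula (covariance divided by the square root of the product of variances). It only flags the highly correlated pairs; the subsequent removal rule is not reproduced. The function names and the three sample variables are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)


def highly_correlated_pairs(variables, threshold=0.99):
    """Return the pairs of variable names whose |R| exceeds the threshold.

    variables: dict mapping a (hypothetical) variable name to its values.
    """
    names = list(variables)
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            r = pearson(variables[names[i]], variables[names[j]])
            if abs(r) > threshold:
                pairs.append((names[i], names[j]))
    return pairs


# Made-up data: f2 is almost a multiple of f1, f3 is unrelated.
sample = {
    "f1": [1, 2, 3, 4, 5],
    "f2": [2, 4, 6, 8, 10.1],
    "f3": [5, 1, 4, 2, 3],
}
flagged = highly_correlated_pairs(sample)
```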

Auto-Encoder 


TABLE 3. Implemented ANN features (F) 6-10.

METHOD | F6: Output Transfer | F7: Output Normalization       | F8: Net Architecture | F9: Hidden Layers | F10: Connectivity
-------|---------------------|--------------------------------|----------------------|-------------------|---------------------
1      | Logistic            | Lin [a, b] = 0.7 [φmin, φmax]  | MLPN                 | 1 HL              | Adjacent Layers
2      | -                   | Lin [a, b] = 0.6 [φmin, φmax]  | RBFN                 | 2 HL              | Adj Layers + In-Out
3      | Hyperbolic Tang     | Lin [a, b] = 0.5 [φmin, φmax]  | -                    | 3 HL              | Fully-Connected
4      | -                   | Linear Mean Std                | -                    | -                 | -
5      | Bilinear            | No                             | -                    | -                 | -
6      | Compet              | -                              | -                    | -                 | -
7      | Identity            | -                              | -                    | -                 | -


This feature extraction technique itself uses a 3-layer feedforward ANN called auto-encoder
(AE). After training, the hidden layer output for the presentation of each
problem's input pattern ($y_{2p}$) is a compressed vector ($Q_2 \times 1$) that can be used to
replace the original input layer by a (much) smaller one, thus reducing the size of the ANN
model. In this work, $Q_2 = round(Q_1/2)$ was adopted, round being a function that rounds
the argument to the nearest integer. The implemented AE was trained using the
'trainAutoencoder(...)' function from MATLAB's neural net toolbox. In order to select
the best AE, 40 AEs were simulated, and their performance compared by means of the
performance variable defined in sub-section 3.4. Each AE considered distinct (random)
initialization parameters, half of the models used the 'logsig' hidden transfer function,
and the other half used the 'satlin' counterpart, the identity function being the common
option for the output activation. In each AE, the maximum number of epochs (number
of times the whole training dataset is presented to the network during learning) was
defined (regardless of the amount of data) by

$$max\ epochs = \begin{cases} 3000, & Q_1 > 8 \\ 1500, & Q_1 \leq 8 \end{cases} \tag{2}$$

Concerning the learning algorithm used for all AEs, no $L_2$ weight regularization was
employed, which was the only default specification not adopted in 'trainAutoencoder(...)'.
TABLE 4. Implemented ANN features (F) 11-15.

METHOD | F11: Hidden Transfer | F12: Parameter Initialization | F13: Learning Algorithm | F14: Performance Improvement | F15: Training Mode
-------|----------------------|-------------------------------|-------------------------|------------------------------|-------------------
1      | Logistic             | Midpoint (W) + Rands (b)      | BP                      | NNC                          | Batch
2      | Identity-Logistic    | Rands                         | BPA                     | -                            | Mini-Batch
3      | Hyperbolic Tang      | Randnc (W) + Rands (b)        | LM                      | -                            | Online
4      | Bipolar              | Randnr (W) + Rands (b)        | ELM                     | -                            | -
5      | Bilinear             | Randsmall                     | mb ELM                  | -                            | -
6      | Positive Sat Linear  | Rand [-Δ, Δ]                  | I-ELM                   | -                            | -
7      | Sinusoid             | SVD                           | CI-ELM                  | -                            | -
8      | Thin-Plate Spline    | MB SVD                        | -                       | -                            | -
9      | Gaussian             | -                             | -                       | -                            | -
10     | Multiquadratic       | -                             | -                       | -                            | -
11     | Radbas               | -                             | -                       | -                            | -


Orthogonal and Sparse Random Projections 

This is another feature extraction technique aiming to reduce the dimension of input
data ($Q_1 \times P$) while retaining the Euclidean distance between data points in the new
feature space. This is attained by projecting all data along the (i) orthogonal or (ii) sparse
random matrix $A$ ($Q_1 \times Q_2$, $Q_2 < Q_1$), as described by Kasun et al. [29].
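
To make the projection step concrete, here is a small Python sketch of a sparse random projection. It is not the scheme of Kasun et al. [29], whose details are not given in this section; as a stand-in it assumes the widely used sparse construction in which each entry of the matrix is +c, 0 or -c with probabilities 1/6, 2/3 and 1/6, with c = sqrt(3 / Q2). All function names are hypothetical.

```python
import math
import random


def sparse_random_matrix(q1, q2, seed=0):
    """Q1 x Q2 sparse random matrix (assumed construction, see lead-in)."""
    rng = random.Random(seed)
    c = math.sqrt(3.0 / q2)
    # Six equally likely slots give probabilities 1/6, 2/3 and 1/6.
    slots = [c, 0.0, 0.0, 0.0, 0.0, -c]
    return [[rng.choice(slots) for _ in range(q2)] for _ in range(q1)]


def project(data, matrix):
    """Project data (Q1 x P, one row per variable) onto Q2 dimensions.

    Returns the Q2 x P matrix A^T * data.
    """
    q1 = len(data)
    p = len(data[0])
    q2 = len(matrix[0])
    return [
        [sum(matrix[i][k] * data[i][j] for i in range(q1)) for j in range(p)]
        for k in range(q2)
    ]


# Hypothetical dataset with Q1 = 6 input variables and P = 4 patterns.
data = [[float(i + j) for j in range(4)] for i in range(6)]
A = sparse_random_matrix(6, 3)
reduced = project(data, A)
```

Distances are only preserved approximately, and the approximation generally improves as Q2 grows.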

Training, Validation and Testing Datasets (feature 4) 

Four distributions of data (methods) were implemented, namely $p_t$-$p_v$-$p_{tt}$ = {80-10-10,
70-15-15, 60-20-20, 50-25-25}, where $p_t$, $p_v$ and $p_{tt}$ represent the amount of training,
validation and testing examples as % of all learning data (P), respectively. Aiming to divide
learning data into training, validation and testing subsets according to a predefined distribution
$p_t$-$p_v$-$p_{tt}$, the following algorithm was implemented (all variables are involved in these
steps, including qualitative ones after being converted to numeric, see 3.3.1):

1. For each variable q (row) in the complete input dataset, compute its minimum and
maximum values.

2. Select all patterns (if some) from the learning dataset where each variable takes
either its minimum or maximum value. Those patterns must be included in the
training dataset, regardless of what $p_t$ is. However, if the number of patterns "does not
reach" $p_t$, one should add the missing amount, provided those patterns are the
ones having more variables taking extreme (minimum or maximum) values.


Abambres / Marcy / Doz (2019) 


3. In order to select the validation patterns, randomly select $p_v / (p_v + p_{tt})$ of those
patterns not belonging to the previously defined training dataset. The remainder
defines the testing dataset. It might happen that the actual distribution $p_t$-$p_v$-$p_{tt}$ is
not equal to the one imposed a priori (before step 1), which is due to the minimum
required training patterns specified in step 2.
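
The splitting steps can be sketched in Python as follows. This is one possible reading of the algorithm, not the authors' MATLAB code: patterns in which every variable sits at an extreme are forced into the training set, the set is then filled up to the target size with the patterns having the most variables at extreme values (ties broken by pattern order, an assumption), and the remainder is split at random between validation and testing. The function name and the sample data are hypothetical.

```python
import random


def split_patterns(data, p_t, p_v, p_tt, seed=0):
    """Split pattern (column) indices into training, validation and testing.

    data: list of rows, one per variable, each holding P values.
    p_t, p_v, p_tt: target percentages of training/validation/testing data.
    """
    n_vars = len(data)
    n_patterns = len(data[0])
    # Step 1: minimum and maximum of each variable.
    extremes = [(min(row), max(row)) for row in data]
    # Number of variables taking an extreme value, per pattern.
    counts = [
        sum(1 for row, (lo, hi) in zip(data, extremes) if row[j] in (lo, hi))
        for j in range(n_patterns)
    ]
    # Step 2: patterns with all variables at an extreme always go to training.
    train = [j for j in range(n_patterns) if counts[j] == n_vars]
    target = round(p_t * n_patterns / 100)
    if len(train) < target:
        others = [j for j in range(n_patterns) if counts[j] < n_vars]
        others.sort(key=lambda j: -counts[j])  # most extreme values first
        train += others[: target - len(train)]
    # Step 3: random validation share of what is left; the rest is testing.
    chosen = set(train)
    remaining = [j for j in range(n_patterns) if j not in chosen]
    random.Random(seed).shuffle(remaining)
    n_valid = round(p_v / (p_v + p_tt) * len(remaining))
    return sorted(train), sorted(remaining[:n_valid]), sorted(remaining[n_valid:])


# Hypothetical dataset: 2 variables, 10 patterns, 80-10-10 distribution.
data = [
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
    [5, 3, 9, 1, 4, 6, 2, 8, 7, 0],
]
train, valid, test = split_patterns(data, 80, 10, 10)
```

Keeping the extreme-valued patterns in the training set is consistent with the earlier remark that the networks should not be used outside the input ranges seen during training.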

Input Normalization (feature 5) 

The progress of training can be impaired if training data defines a region that is relatively
narrow in some dimensions and elongated in others, which can be alleviated by
normalizing each input variable across all data patterns. The implemented techniques
are the following:


Linear Max Abs

Lachtermacher and Fuller [30] proposed a simple normalization technique given by

$$\{Y_1\}_n(i,:) = \frac{Y_1(i,:)}{\max\{|Y_1(i,:)|\}} \tag{3}$$

where $\{Y_1\}_n(i,:)$ and $Y_1(i,:)$ are the normalized and non-normalized values of input
variable $i$ for all learning patterns, respectively. Notation ":" in the column index indicates the
selection of all columns (learning patterns).

Linear [0, 1] and [-1, 1]

A linear transformation for each input variable ($i$), mapping values in $Y_1(i,:)$ from
$[a^*, b^*] = [\min(Y_1(i,:)), \max(Y_1(i,:))]$ to a generic range $[a, b]$, is obtained from

$$\{Y_1\}_n(i,:) = a + \frac{\left(Y_1(i,:) - a^*\right)(b - a)}{b^* - a^*} \tag{4}$$

Ranges $[a, b] = [0, 1]$ and $[a, b] = [-1, 1]$ were considered.


Nonlinear 

Proposed by Pu and Mesbahi [31], although in the context of output normalization, the
only nonlinear normalization method implemented for input data reads

$$\{Y_1\}_n(i,j) = [\ldots] + C(i) \tag{5}$$

where (i) $Y_1(i,j)$ is the non-normalized value of input variable $i$ for pattern $j$, (ii) f is the
number of digits in the integer part of $Y_1(i,j)$, (iii) sign(...) yields the sign of the argument,
and (iv) $C(i)$ is the average of two values concerning variable $i$, $C_1(i)$ and $C_2(i)$, where the
former leads to a minimum normalized value of 0.2 for all patterns, and the latter leads
to a maximum normalized value of 0.8 for all patterns.


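Linear Mean Std

Tohidi and Sharifi [32] proposed the following technique

$$\{Y_1\}_n(i,:) = \frac{Y_1(i,:) - \mu_{Y_1(i,:)}}{\sigma_{Y_1(i,:)}} \tag{6}$$

where $\mu_{Y_1(i,:)}$ and $\sigma_{Y_1(i,:)}$ are the mean and standard deviation of all
non-normalized values (all patterns) stored by variable $i$.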
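
The three linear normalizations can be sketched in Python for a single input variable (one row of the input dataset). The function names are hypothetical, the range mapping is the standard linear map from [min, max] to [a, b], and the use of the sample standard deviation (divisor n - 1, the default of MATLAB's std) is an assumption.

```python
import math


def linear_max_abs(values):
    """Divide all values of one variable by its largest absolute value."""
    m = max(abs(v) for v in values)
    return [v / m for v in values]


def linear_range(values, a, b):
    """Map values linearly from [min, max] to the range [a, b]."""
    lo, hi = min(values), max(values)
    return [a + (v - lo) * (b - a) / (hi - lo) for v in values]


def linear_mean_std(values):
    """Subtract the mean and divide by the (sample) standard deviation."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / std for v in values]


# Hypothetical values of one input variable over three patterns.
freqs = [10.0, 15.0, 20.0]
in_01 = linear_range(freqs, 0.0, 1.0)
in_pm1 = linear_range(freqs, -1.0, 1.0)
```

A variable that is constant over all patterns would cause a division by zero in each of these functions, so such variables would need separate handling.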
Output Transfer Functions (feature 6)

Logistic

The most usual form of transfer functions is called Sigmoid. An example is the logistic
function given by

$$\varphi(s) = \frac{1}{1 + e^{-s}} \tag{7}$$

Hyperbolic Tang

The Hyperbolic Tangent function is also of sigmoid type, being defined as

$$\varphi(s) = \frac{e^{s} - e^{-s}}{e^{s} + e^{-s}} \tag{8}$$

Bilinear

The implemented Bilinear function is defined as

$$\varphi(s) = \begin{cases} [\ldots], & s \geq 0 \\ [\ldots], & s < 0 \end{cases} \tag{9}$$

Identity

The Identity activation is often employed in output neurons, reading

$$\varphi(s) = s \tag{10}$$

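
For reference, the logistic, hyperbolic tangent and identity functions can be written in a few lines of Python. This is a plain sketch using the textbook definitions, with hypothetical function names; the bilinear function is left out because its two branches are not legible in this section.

```python
import math


def logistic(s):
    """Logistic sigmoid, ranging in ]0, 1[."""
    return 1.0 / (1.0 + math.exp(-s))


def hyperbolic_tangent(s):
    """Hyperbolic tangent, ranging in ]-1, 1[."""
    return math.tanh(s)


def identity(s):
    """Identity activation, unbounded."""
    return s


# Outputs of the three functions for a net input of zero.
at_zero = (logistic(0.0), hyperbolic_tangent(0.0), identity(0.0))
```

Note that `math.exp(-s)` overflows for very negative net inputs (below roughly -709), so a production version of `logistic` would need a guard for that case.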


Output Normalization (feature 7) 

Normalization can also be applied to the output variables so that, for instance, the
amplitude of the solution surface at each variable is the same. Otherwise, training may
tend to focus (at least in the earlier stages) on the solution surface with the greatest
amplitude [33]. Normalization ranges not including the zero value might be a useful
alternative since convergence issues may arise due to the presence of many small (close
to zero) target values [34]. Four normalization methods were implemented. The first
three follow eq. (4), where (i) $[a, b]$ = 70% $[\varphi_{min}, \varphi_{max}]$, (ii) $[a, b]$ = 60%
$[\varphi_{min}, \varphi_{max}]$, and (iii) $[a, b]$ = 50% $[\varphi_{min}, \varphi_{max}]$,
$[\varphi_{min}, \varphi_{max}]$ being the output transfer function range, and $[a, b]$
determined to be centered within $[\varphi_{min}, \varphi_{max}]$ and to span the specified %
(e.g., $(b - a) = 0.7\,(\varphi_{max} - \varphi_{min})$). Whenever the output transfer functions
are unbounded (Bilinear and Identity), $[a, b] = [0, 1]$ and $[a, b] = [-1, 1]$ were considered,
respectively. The fourth normalization method implemented is the one described by eq. (6).
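
As a short worked example of the centered ranges, the Python sketch below computes [a, b] from the transfer function range and the chosen fraction. The function name `centered_range` is hypothetical.

```python
def centered_range(phi_min, phi_max, fraction):
    """Range [a, b] centered in [phi_min, phi_max] spanning the given fraction."""
    middle = (phi_min + phi_max) / 2.0
    half_span = fraction * (phi_max - phi_min) / 2.0
    return (middle - half_span, middle + half_span)


# Logistic output (range [0, 1]) with the 70% option: approximately [0.15, 0.85].
a_log, b_log = centered_range(0.0, 1.0, 0.7)
# Hyperbolic tangent output (range [-1, 1]) with the 50% option: [-0.5, 0.5].
a_tanh, b_tanh = centered_range(-1.0, 1.0, 0.5)
```

In both cases the zero-gradient extremes of the sigmoid are avoided, since targets never sit at the limits of the transfer function range.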

Network Architecture (feature 8) 

Multi-Layer Perceptron Network (MLPN)

This is a feedforward ANN exhibiting at least one hidden layer. Fig. 2 depicts a 3-2-1 MLPN
(3 input nodes, 2 hidden neurons and 1 output neuron), where units in each layer link only
to some nodes located ahead. At this moment, it is appropriate to define the concept of
partially- (PC) and fully-connected (FC) ANNs. In this work, an FC feedforward network is
characterized by having each node connected to every node in a different layer placed
forward; any other type of network is said to be PC (e.g., the one in Fig. 2). According
to Wilamowski [35], PC MLPNs are less powerful than MLPNs where connections across
layers are allowed, the latter usually leading to smaller networks (fewer neurons).
Fig. 4 represents a generic MLPN composed of L layers, where $l$ ($l = 1,..., L$) is a generic
layer and "ql" a generic node, $q = 1,..., Q_l$ being its position in layer $l$ (1 is reserved to
the top node). Fig. 5 represents the model of a generic neuron ($l = 2,..., L$), where (i) p
represents the data pattern presented to the network, (ii) subscripts $m = 1,..., Q_n$ and
$n = 1,..., l-1$ are summation indexes representing all possible nodes connecting to neuron
"ql" (recall Fig. 4), (iii) $b_{ql}$ is the neuron's bias, and (iv) $w_{mnql}$ represents the synaptic
weight connecting units "mn" and "ql". The neuron's net input for the presentation of pattern p
($S_{qlp}$) is defined as

$$S_{qlp} = y_{mnp}\, w_{mnql} + b_{ql}, \qquad y_{mnp}\, w_{mnql} \equiv \sum_{m=1}^{Q_n} \sum_{n=1}^{l-1} y_{mnp}\, w_{mnql} \tag{11}$$

where $y_{m1p}$ is the value of the $m$-th network input concerning example p. The output of a
generic neuron can then be written as ($l = 2,..., L$)

$$y_{qlp} = \varphi_l(S_{qlp}) \tag{12}$$

where $\varphi_l$ is the transfer function used for all neurons in layer $l$.
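
The neuron model can be illustrated with a minimal Python forward pass. For brevity the sketch assumes adjacent-layer connectivity only (each neuron receives the outputs of the previous layer), whereas the general model above also allows connections from earlier layers. The function names and all weights and biases are hypothetical.

```python
import math


def neuron_output(inputs, weights, bias, transfer):
    """Net input (weighted sum plus bias) passed through the transfer function."""
    net_input = sum(y * w for y, w in zip(inputs, weights)) + bias
    return transfer(net_input)


def forward(x, layers):
    """Propagate one pattern x through the layers of a feedforward net.

    layers: list of (weight_rows, biases, transfer), one entry per layer
    2..L, where weight_rows[q] holds the weights feeding neuron q.
    """
    y = x
    for weight_rows, biases, transfer in layers:
        y = [neuron_output(y, w, b, transfer) for w, b in zip(weight_rows, biases)]
    return y


# Hypothetical 3-2-1 network: tanh hidden layer, identity output layer.
hidden = ([[0.1, 0.2, 0.3], [0.0, 0.0, 0.0]], [0.0, 0.0], math.tanh)
output = ([[1.0, 1.0]], [0.5], lambda s: s)
prediction = forward([1.0, 1.0, 1.0], [hidden, output])
```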

FIGURE 4. Generic multi-layer feedforward network. 



FIGURE 5. Generic neuron placed anywhere in the MLPN of Fig. 4 (l = 2,..., L).
Radial-Basis Function Network (RBFN)

Although having similar topologies, RBFN and MLPN behave very differently due to distinct
hidden neuron models: unlike the MLPN, RBFN have hidden neurons behaving differently
than output neurons. According to Xie et al. [36], RBFN (i) are specially recommended in
functional approximation problems when the function surface exhibits regular peaks and
valleys, and (ii) perform more robustly than MLPN when dealing with noisy input data.
Although traditional RBFN have 3 layers, a generic multi-hidden layer (see Fig. 4) RBFN
is allowed in this work, the generic hidden neuron's model concerning node "$l_1 l_2$"
($l_1 = 1,..., Q_{l_2}$, $l_2 = 2,..., L-1$) being presented in Fig. 6. In this model, (i) $v_{l_1 l_2 p}$
and $\xi_{l_1 l_2}$ (called RBF center) are vectors of the same size ($\xi_{z l_1 l_2}$ denotes the $z$
component of vector $\xi_{l_1 l_2}$, and it is a network unknown), the former being associated to
the presentation of data pattern p, (ii) $\sigma_{l_1 l_2}$ is called RBF width (a positive scalar) and
also belongs, along with synaptic weights and RBF centers, to the set of network unknowns to
be determined through learning, (iii) $\varphi_{l_2}$ is the user-defined radial basis (transfer)
function (RBF), described in eqs. (20)-(23), and (iv) $y_{l_1 l_2 p}$ is the neuron's output when
pattern p is presented to the network. In ANNs not involving learning algorithms 1-3 in Tab. 4,
vectors $v_{l_1 l_2 p}$ and $\xi_{l_1 l_2}$ are defined as (two versions of $v_{l_1 l_2 p}$ were
implemented and the one yielding the best results was selected)

$$v_{l_1 l_2 p} = \left[\, y_{1(l_2-1)p}\, w_{1(l_2-1)l_1 l_2} \;\ldots\; y_{z(l_2-1)p}\, w_{z(l_2-1)l_1 l_2} \;\ldots\; y_{Q_{l_2-1}(l_2-1)p}\, w_{Q_{l_2-1}(l_2-1)l_1 l_2} \,\right]$$

or

$$v_{l_1 l_2 p} = \left[\, y_{1(l_2-1)p} \;\ldots\; y_{z(l_2-1)p} \;\ldots\; y_{Q_{l_2-1}(l_2-1)p} \,\right]$$

and

$$\xi_{l_1 l_2} = \left[\, \xi_{1 l_1 l_2} \;\ldots\; \xi_{z l_1 l_2} \;\ldots\; \xi_{Q_{l_2-1} l_1 l_2} \,\right] \tag{13}$$

whereas the RBFNs implemented through MATLAB neural net toolbox (involving
learning algorithms 1-3 in Tab. 4) are based on the following definitions

$$\begin{aligned}
v_{l_1 l_2 p} &= \left[\, y_{1(l_2-1)p} \;\ldots\; y_{z(l_2-1)p} \;\ldots\; y_{Q_{l_2-1}(l_2-1)p} \,\right] \\
\xi_{l_1 l_2} &= \left[\, w_{1(l_2-1)l_1 l_2} \;\ldots\; w_{z(l_2-1)l_1 l_2} \;\ldots\; w_{Q_{l_2-1}(l_2-1)l_1 l_2} \,\right]
\end{aligned} \tag{14}$$

Lastly, according to the implementation carried out for initialization purposes (described
in 3.3.12), (i) RBF center vectors per hidden layer (one per hidden neuron) are initialized as
integrated in a matrix (termed RBF center matrix) having the same size as a weight matrix
linking the previous layer to that specific hidden layer, and (ii) RBF widths (one per hidden
neuron) are initialized as integrated in a vector (called RBF width vector) with the same size
as a hypothetical bias vector.

Hidden Nodes (feature 9) 

Inspired by several heuristics found in the literature for the determination of a suitable
number of hidden neurons in a single hidden layer net [37-39], each value in hntest,
defined in eq. (15), was tested in this work as the total number of hidden nodes in the
model, i.e. the sum of nodes in all hidden layers (initially defined with the same number
of neurons). The number yielding the smallest performance measure for all patterns
(as defined in 3.4, with outputs and targets not postprocessed) is adopted as the best
solution. The aforementioned hntest is defined by

$$\begin{aligned}
incr &= [4, 4, 4, 10, 10, 10, 10] \\
minimum &= [1, 1, 1, 10, 10, 10, 10] \\
max_1 &= \min\left(round\left(\max\left(2Q_1 + Q_L,\ [\ldots \ln(P) \ldots]\right)\right),\ 1500\right) \\
max_2 &= \max\left(\min\left(round(0.1P),\ 1500\right),\ 300\right) \\
maximum &= [max_1, max_1, max_1, max_2, max_2, max_2, max_2] \\
hntest &= minimum(F_{13}) : incr(F_{13}) : maximum(F_{13})
\end{aligned} \tag{15}$$

where (i) $Q_1$ and $Q_L$ are the number of input and output nodes, respectively, (ii) $P$ and $P_t$
are the number of learning and training patterns, respectively, and (iii) $F_{13}$ is the number
of feature 13's method (see Tab. 4).

Connectivity (feature 10) 

For this ANN feature, three methods were implemented, namely (i) adjacent layers -
only connections between adjacent layers are made possible, (ii) adjacent layers + input-output
- only connections between (ii1) adjacent and (ii2) input and output layers are
allowed, and (iii) fully-connected (all possible feedforward connections).
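
The three connectivity methods can be made concrete by listing which pairs of layers may be linked. The sketch below is only an illustration: layers are numbered from 1 (input) to L (output), and the function name and method labels are hypothetical.

```python
def allowed_connections(n_layers, method):
    """Return the sorted (source, target) layer pairs allowed by a method.

    Layers are numbered 1..n_layers; only feedforward pairs are produced.
    """
    adjacent = {(a, a + 1) for a in range(1, n_layers)}
    if method == "adjacent":
        pairs = adjacent
    elif method == "adjacent+input-output":
        pairs = adjacent | {(1, n_layers)}
    elif method == "fully-connected":
        pairs = {(a, b)
                 for a in range(1, n_layers)
                 for b in range(a + 1, n_layers + 1)}
    else:
        raise ValueError(f"unknown connectivity method: {method}")
    return sorted(pairs)


if __name__ == "__main__":
    for name in ("adjacent", "adjacent+input-output", "fully-connected"):
        print(name, allowed_connections(5, name))
```

For a 5-layer network, the fully-connected method yields 10 weight matrices, against 4 for the adjacent-layers method.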

FIGURE 6. Generic hidden neuron placed anywhere in the RBFN of Fig. 4 (/^ = 2,..M). 



Hidden Transfer Functions (feature 11) 

Besides functions (i) Logistic - eq. (7), (ii) Hyperbolic Tangent - eq. (8), and (iii) Bilinear -
eq. (9), defined in 3.3.6, the ones defined next were also implemented as hidden transfer
functions. During software validation it was observed that some hidden node outputs
could be infinite or NaN (not-a-number in MATLAB - e.g., 0/0 = Inf/Inf = NaN), due to
numerical issues concerning some hidden transfer functions and/or their calculated input.
In those cases, it was decided to convert infinite values to unity and NaNs to zero (the
only exception was the bipolar sigmoid function, where NaNs were converted to -1). Another
implemented trick was to convert possible NaN inputs of the Gaussian function to zero.
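
The conversion rule just described can be sketched as below. This is not the authors' MATLAB code; the function name is hypothetical, and it is assumed here that both +Inf and -Inf are mapped to 1, since the text only says that infinite values become unitary.

```python
import math


def sanitize_hidden_output(value, bipolar=False):
    """Replace non-finite hidden node outputs by finite surrogates.

    NaN  -> 0 (or -1 for the bipolar sigmoid function)
    Inf  -> 1 (assumption: the sign of the infinity is not preserved)
    """
    if math.isnan(value):
        return -1.0 if bipolar else 0.0
    if math.isinf(value):
        return 1.0
    return value


if __name__ == "__main__":
    raw = [0.3, float("inf"), float("nan")]
    print([sanitize_hidden_output(v) for v in raw])
```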

Identity-Logistic

In Gunaratnam and Gero [40], issues associated with flat spots at the extremes of a
sigmoid function were eliminated by adding a linear function to the latter, reading

\[ \varphi_2(s) = \frac{1}{1 + e^{-s}} + s \tag{16} \]


Bipolar

The so-called bipolar sigmoid activation function mentioned in Lefik and Schrefler [41],
ranging in [-1, 1], reads

\[ \varphi_4(s) = \frac{1 - e^{-s}}{1 + e^{-s}} \tag{17} \]


Positive Saturating Linear 

In MATLAB neural net toolbox, the so-called Positive Saturating Linear transfer function,
ranging in [0, 1], is defined as

\[
\varphi_6(s) =
\begin{cases}
1, & s \geq 1 \\
s, & 0 < s < 1 \\
0, & s \leq 0
\end{cases}
\tag{18}
\]
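
For readers who prefer code, the three closed-form functions above can be sketched in Python as follows. The sketch assumes the usual forms s + 1/(1 + e^-s) for the Identity-Logistic and (1 - e^-s)/(1 + e^-s) for the bipolar sigmoid (the latter is evaluated through the equivalent identity tanh(s/2), which avoids overflow); the function names are hypothetical.

```python
import math


def logistic(s):
    """Numerically stable logistic function 1 / (1 + exp(-s))."""
    if s >= 0:
        return 1.0 / (1.0 + math.exp(-s))
    z = math.exp(s)
    return z / (1.0 + z)


def identity_logistic(s):
    """Logistic function plus a linear term, removing flat spots."""
    return logistic(s) + s


def bipolar_sigmoid(s):
    """(1 - exp(-s)) / (1 + exp(-s)), ranging in [-1, 1]."""
    return math.tanh(s / 2.0)


def positive_saturating_linear(s):
    """0 for s <= 0, s for 0 < s < 1, 1 for s >= 1."""
    return min(max(s, 0.0), 1.0)


if __name__ == "__main__":
    for s in (-2.0, 0.0, 0.5, 3.0):
        print(s, identity_logistic(s), bipolar_sigmoid(s),
              positive_saturating_linear(s))
```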

Sinusoid 

Concerning less popular transfer functions, reference is made in [42] to the sinusoid,
which in this work was implemented as


Radial Basis Functions (RBF) 

Although Gaussian activation often exhibits desirable properties as an RBF, several authors
(e.g., [43]) have suggested alternatives. Following the nomenclature used in 3.3.8, (i)
the Thin-Plate Spline function is defined by eq. (20), (ii) the next function is employed as
the Gaussian-type function when learning algorithms 4-7 are used (see Tab. 4),


(iii) the Multiquadratic function is given by




g-0.5. 



( 21 ) 


and (iv) the Gaussian-type function (called "radbas" in the MATLAB toolbox) used by RBFNs
trained with learning algorithms 1-3 (see Tab. 4), is defined by




( 5 ) = ^/ 7 , 


'"hhp 


(22)

where || ... || denotes the Euclidean distance in all functions.


<Pu{s) = e-^\ s = 


(23) 


Parameter Initialization (feature 12) 

The initialization of (i) weight matrices ($Q_a \times Q_b$, being $Q_a$ and $Q_b$ the node numbers in layers
a and b being connected, respectively), (ii) bias vectors ($Q_b \times 1$), (iii) RBF center matrices
($Q_{c-1} \times Q_c$, being c the hidden layer that matrix refers to), and (iv) RBF width vectors ($Q_c \times 1$), are
independent and in most cases randomly generated. For each ANN design carried out in
the context of each parametric analysis combo, and whenever the parameter initialization
method is not the "Mini-Batch SVD", ten distinct simulations varying (due to their random
nature) initialization values are carried out, in order to find the best solution. The implemented
initialization methods are described next.

Midpoint, Rands, Randnc, Randnr, Randsmall

These are all MATLAB built-in functions. Midpoint is used to initialize weight and RBF center
matrices only (not vectors). All columns of the initialized matrix are equal, being each entry
equal to the midpoint of the (training) output range leaving the corresponding initial
layer node - recall that in weight matrices, columns represent each node in the final layer
being connected, whereas rows represent each node in the initial layer counterpart. Rands
generates random numbers with uniform distribution in [-1, 1]. Randnc (only used to initialize
matrices) generates random numbers with uniform distribution in [-1, 1], and normalizes each
array column to 1 (unitary Euclidean norm). Randnr (only used to initialize matrices) generates
random numbers with uniform distribution in [-1, 1], and normalizes each array row to 1
(unitary Euclidean norm). Randsmall generates random numbers with uniform distribution
in [-0.1, 0.1].
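
The behaviour described for Rands, Randnc, Randnr and Randsmall can be mimicked with the short Python sketch below. It is an illustration written from the description above rather than a port of the MATLAB functions; matrices are plain nested lists and the seeded generator is only there to make the example reproducible.

```python
import math
import random


def rands(rows, cols, rng, scale=1.0):
    """Matrix of uniform random numbers in [-scale, scale]."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)]
            for _ in range(rows)]


def randsmall(rows, cols, rng):
    """Matrix of uniform random numbers in [-0.1, 0.1]."""
    return rands(rows, cols, rng, scale=0.1)


def randnr(rows, cols, rng):
    """Uniform numbers in [-1, 1], each row scaled to unit Euclidean norm."""
    matrix = rands(rows, cols, rng)
    return [[v / math.sqrt(sum(x * x for x in row)) for v in row]
            for row in matrix]


def randnc(rows, cols, rng):
    """Uniform numbers in [-1, 1], each column scaled to unit Euclidean norm."""
    # Normalizing the rows of the transpose normalizes the columns.
    transposed = randnr(cols, rows, rng)
    return [[transposed[j][i] for j in range(cols)] for i in range(rows)]


if __name__ == "__main__":
    rng = random.Random(0)
    weights = randnc(3, 4, rng)   # e.g. a 3 x 4 weight matrix
    print(weights)
```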

Rand [-lim, lim]

This function is based on the proposal in [44], and generates random numbers with
uniform distribution in [-lim, lim], being lim layer-dependent and defined by



(24) 


where a and b refer to the initial and final layers integrating the matrix being initialized,
and L is the total number of layers in the network. In the case of a bias or RBF width
vector, lim is always taken as 0.5.

SVD 

Although Deng et al. [45] proposed this method for a 3-layer network, it was implemented
in this work regardless of the number of hidden layers.

Mini-Batch SVD

Based on [45], this scheme is an alternative version of the former SVD. Now, training
data is split into $\min\{Q_2, P_t\}$ chunks (or subsets) of equal size $P_{ti} = \max\{\mathrm{floor}(P_t / Q_2), 1\}$ -
floor rounds the argument to the previous integer (whenever it is decimal) or yields the
argument itself, being each chunk aimed to derive $Q_{2i} = 1$ hidden node.

Learning Algorithm (feature 13)

The most popular learning algorithm is called error back-propagation (BP), a first-order
gradient method. Second-order gradient methods are known to have higher training
accuracy [46]. The most employed is called Levenberg-Marquardt (LM). All these
traditional schemes were implemented using MATLAB toolbox [26].

Back-Propagation (BP, BPA), Levenberg-Marquardt (LM)

Two types of BP schemes were implemented, one with constant learning rate (BP) - 'traingd'
in MATLAB, and another with iteration-dependent rate, named BP with adaptive learning
rate (BPA) - 'traingda' in MATLAB. The learning parameters set different than their default
values are:

(i) Learning Rate = $0.01 / cs^{0.5}$, being cs the chunk size, as defined in 3.3.15.

(ii) Minimum performance gradient = 0.

Concerning the LM scheme - 'trainlm' in MATLAB, the only learning parameter set
different than its default value was the abovementioned (ii).

Extreme Learning Machine (ELM, mb ELM, I-ELM, CI-ELM)

Besides these traditional learning schemes, iterative and time-consuming by nature, four 
versions of a recent, powerful and non-iterative learning algorithm, called Extreme Learning 
Machine (ELM), were implemented (unlike initially proposed by the authors of ELM, 
connections across layers were allowed in this work), namely: (batch) ELM [47], Mini-Batch 
ELM (mb ELM) [48], Incremental ELM (I-ELM) [49], Convex Incremental ELM (CI-ELM) [50]. 

Performance Improvement (feature 14) 

A simple and recursive approach aiming to improve ANN accuracy is called Neural 
Network Composite (NNC), as described in [51]. In this work, a maximum of 10 extra 
ANNs were added to the original one, until maximum error is not improved between 
successive NNC solutions. Later in this manuscript, a solution given by a single neural net 
might be denoted as ANN, whereas the other possible solution is called NNC. 

Training Mode (feature 15) 

Depending on the relative amount of training patterns, with respect to the whole training
dataset, that is presented to the network in each iteration of the learning process, several
types of training modes can be used, namely (i) batch or (ii) mini-batch. Whereas in
the batch mode all training patterns are presented (called an epoch) to the network in
each iteration, in the mini-batch counterpart the training dataset is split into several data
chunks (or subsets) and in each iteration a single and new chunk is presented to the
network, until (eventually) all chunks have been presented. Learning involving iterative
schemes (e.g., BP- or LM-based) might require many epochs until an "optimum" design is
found. The particular case of having a mini-batch mode where all chunks are composed
by a single (distinct) training pattern (number of data chunks = $P_t$, chunk size = 1), is
called online or sequential mode. Wilson and Martinez [52] suggested that if one wants
to use mini-batch training with the same stability as online training, a rough estimate of
the suitable learning rate to be used in learning algorithms such as the BP, is $\eta_{online}/\sqrt{cs}$,
where cs is the chunk size and $\eta_{online}$ is the online learning rate - their proposal was
adopted in this work. Based on the proposal of Liang et al. [48], the constant chunk size
(cs) adopted for all chunks in mini-batch mode reads cs = min{mean(hn) + 50, P}, being
hn a vector storing the number of hidden nodes in each hidden layer in the beginning
of training, and mean(hn) the average of all values in hn.
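
As a worked illustration of the two rules above, the Python sketch below computes the chunk size cs = min{mean(hn) + 50, P} and the mini-batch learning rate obtained by dividing the online rate by the square root of cs. The function names are hypothetical, the online rate of 0.01 is taken from the BP settings in 3.3.13, and no rounding of cs is applied since the text does not specify one.

```python
import math


def chunk_size(hidden_nodes, n_patterns):
    """cs = min(mean(hn) + 50, P), hn being the hidden nodes per layer."""
    mean_hn = sum(hidden_nodes) / len(hidden_nodes)
    return min(mean_hn + 50, n_patterns)


def mini_batch_learning_rate(online_rate, cs):
    """Wilson and Martinez rule of thumb: online rate divided by sqrt(cs)."""
    return online_rate / math.sqrt(cs)


if __name__ == "__main__":
    hn = [4, 4, 4]                 # three hidden layers with 4 nodes each
    cs = chunk_size(hn, 1000)      # 54 patterns per chunk
    print(cs, mini_batch_learning_rate(0.01, cs))
```

When the dataset has fewer patterns than mean(hn) + 50, the chunk equals the whole dataset and the mini-batch mode reduces to batch mode.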

Network Performance Assessment 

Several types of results were computed to assess network outputs, namely (i) maximum
error, (ii) % errors greater than 3%, and (iii) performance, which are defined next. All
abovementioned errors are relative errors (expressed in %) based on the following
definition, concerning a single output variable and data pattern

\[ e_{qp} = 100 \, \frac{\left| d_{qp} - y_{qLp} \right|}{\left| d_{qp} \right|} \tag{25} \]

where (i) $d_{qp}$ is the desired (or target) output when pattern p within iteration i
(p = 1, ..., $P_i$) is presented to the network, and (ii) $y_{qLp}$ is net's output for the same data
pattern. Moreover, the denominator in eq. (25) is replaced by 1 whenever $|d_{qp}| < 0.05$
($d_{qp}$ in the numerator keeps its real value). This exception to eq. (25) aims to reduce the apparent
negative effect of large relative errors associated to target values close to zero. Even so,
this trick may still lead to (relatively) large solution errors while very satisfactory results
are depicted as regression plots (target vs. predicted outputs).

Maximum Error 

This variable measures the maximum relative error, as defined by eq. (25), among all
output variables and learning patterns.

Percentage of Errors > 3%

This variable measures the percentage of relative errors, as defined by eq. (25), among all
output variables and learning patterns, that are greater than 3%.

Performance

In functional approximation problems, network performance is defined as the average
relative error, as defined in eq. (25), among all output variables and data patterns being
evaluated (e.g., training, all data).
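
The three assessment variables can be sketched in a few lines of Python. The sketch follows the definition of eq. (25), including the replacement of the denominator by 1 for targets smaller than 0.05 in absolute value; the function names and the sample data are hypothetical and a single output variable is assumed.

```python
def relative_error(target, output, threshold=0.05):
    """Relative error in %, with unit denominator for near-zero targets."""
    denominator = abs(target) if abs(target) >= threshold else 1.0
    return 100.0 * abs(target - output) / denominator


def all_errors(targets, outputs):
    return [relative_error(d, y) for d, y in zip(targets, outputs)]


def maximum_error(targets, outputs):
    """Largest relative error among all patterns."""
    return max(all_errors(targets, outputs))


def percentage_errors_above(targets, outputs, limit=3.0):
    """Share (in %) of relative errors greater than `limit` %."""
    errors = all_errors(targets, outputs)
    return 100.0 * sum(e > limit for e in errors) / len(errors)


def performance(targets, outputs):
    """Average relative error among all patterns."""
    errors = all_errors(targets, outputs)
    return sum(errors) / len(errors)


if __name__ == "__main__":
    targets = [1.0, 2.0, 0.0, 4.0]
    outputs = [1.0, 1.9, 0.01, 4.04]
    print(maximum_error(targets, outputs),
          percentage_errors_above(targets, outputs),
          performance(targets, outputs))
```

In the sample data the third target is zero, so its error is computed as 100 x 0.01 / 1 = 1% instead of diverging.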

Software Validation 

Several benchmark datasets/functions were used to validate the developed software, 
involving low- to high-dimensional problems and small to large volumes of data. Due to 
paper length limit, validation results are not presented herein but they were made public 
online [53].

Parametric Analysis Results

Aiming to reduce the computing time by cutting the number of combos to be run
- note that all features combined lead to hundreds of millions of combos - the whole
parametric simulation was divided into nine parametric sub-analyses (SAs), where in each
one feature 7 only takes a single value. This measure aims to make the performance ranking
of all combos within each "small" analysis more "reliable", since results used for comparison
are based on target and output datasets as used in ANN training and yielded by the
designed network, respectively (they are free of any postprocessing that eliminates
output normalization effects on relative error values). The sub-analyses were organized
as follows: (i) the 1st and 2nd SAs
aimed to select the best methods from features 1, 2, 5, 8 and 13 (all combined), while
adopting a single popular method for each of the remaining features (F3: 6, F4: 2, F6:
{1 or 7}, F7: 1, F9: 1, F10: 1, F11: {3, 9 or 11}, F12: 2, F14: 1, F15: 1 - see Tabs. 2-4) - SA 1
involved learning algorithms 1-3 and SA 2 involved the ELM-based counterpart, (ii) the
3rd-7th SAs combined all possible methods from features 3, 4, 6 and 7, and concerning
all other features, adopted the methods integrating the best combination from the
aforementioned first SA, (iii) the 8th SA combined all possible methods from features
11, 12 and 14, and concerning all other features, adopted the methods integrating the
best combination (results compared after postprocessing) among the previous five
sub-analyses, and lastly (iv) the 9th SA combined all possible methods from features 9, 10
and 15, and concerning all other features, adopted the methods integrating the best
combination from the previous analysis.


ANN feature methods used in the best combo from each of the abovementioned nine
parametric sub-analyses, are specified in Tab. 5 (the numbers represent the method
number as in Tabs. 2-4). Tab. 6 shows the corresponding relevant results for those combos,
namely (i) maximum error, (ii) % errors > 3%, (iii) performance (all described in section
3, and evaluated for all learning data), (iv) total number of hidden nodes in the model,
and (v) average computing time per example (including data pre- and post-processing).
All results shown in Tab. 6 are based on target and output datasets computed in their
original format, i.e. free of any transformations due to output normalization and/or
dimensional analysis. The computer used in this work has the following features:
OS: Win10 Home 64 bits, RAM: 8 GB, Local Disk Memory: 128 GB, CPU: Intel® Core™ i5
6200U (dual-core) @ 2.30 GHz.

TABLE 5. ANN feature (F) methods used in the best combo from each parametric sub-analysis (SA).

SA   F1   F2   F3   F4   F5   F6   F7   F8   F9   F10  F11  F12  F13  F14  F15
1    1    2    6    2    1    1    1    1    1    1    3    2    3    1    3
2    1    2    6    2    3    7    1    2    1    1    9    2    7    1    3
3    1    2    6    3    1    1    1    1    1    1    3    2    3    1    3
4    1    2    6    2    1    1    2    1    1    1    3    2    3    1    3
5    1    2    6    1    1    1    3    1    1    1    3    2    3    1    3
6    1    2    6    2    1    7    4    1    1    1    3    2    3    1    3
7    1    2    6    4    1    7    5    1    1    1    3    2    3    1    3
8    1    2    6    4    1    7    5    1    1    1    3    2    3    1    3
9    1    2    6    4    1    7    5    1    3    3    3    2    3    1    3


Overall, to obtain satisfactory results, 219 ANN feature combinations were run in the 
parametric analysis of this problem. In 3.7, the best ANN-based model obtained is proposed 
to efficiently and effectively solve the real-world problem addressed. In sub-section 3.7.4, 
the performance results of the proposed ANN are also based on target and output datasets 
computed in their original format. 


TABLE 6. Performance results for the best design from each parametric sub-analysis: (a) ANN, (b) NNC.

(a) ANN

SA   Max Error   Performance    Errors > 3%   Total Hidden   Running Time /
     (%)         All Data (%)   (%)           Nodes          Data Point (s)
1    46.2        2.7            23.4          12             4.40E-03
2    1598.9      99.2           90.6          43             2.58E-04
3    45.5        2.5            21.9          12             1.77E-03
4    141.2       8.7            34.4          12             3.23E-04
5    10.1        1.6            17.2          12             1.74E-03
6    253.4       8.9            31.3          12             3.04E-03
7    12.4        1.2            9.4           12             7.53E-04
8    108.8       8.4            31.3          12             1.44E-03
9    0.2         0.0            0.0           12             1.34E-03

(b) NNC

SA   Max Error   Performance    Errors > 3%   Total Hidden   Running Time /
     (%)         All Data (%)   (%)           Nodes          Data Point (s)
1    14.1        1.7            20.3          12             4.56E-03
2    -           -              -             -              -
3    -           -              -             -              -
4    -           -              -             -              -
5    -           -              -             -              -
6    253.3       8.4            28.1          12             5.22E-03
7    9.4         0.6            4.7           12             1.65E-03
8    108.7       8.1            28.1          12             3.79E-03
9    -           -              -             -              -


Proposed ANN-Based Model


The proposed ANN is the one, among the ones simulated during the parametric analysis,
exhibiting the lowest maximum error. In this case, that model was yielded by SA 9 and
is characterized by the ANN feature methods {1, 2, 6, 4, 1, 7, 5, 1, 3, 3, 3, 2, 3, 1, 3} in Tabs.
2-4. Aiming to allow implementation of this model by any user, all variables/equations
required for (i) data preprocessing, (ii) ANN simulation, and (iii) data postprocessing, are
presented in 3.7.1-3.7.3, respectively. The proposed ANN is an MLPN with 5 layers and a
distribution of nodes/layer given by 3-4-4-4-1. Concerning connectivity, the network is
fully-connected (across-layer connections allowed), and the hidden and output transfer
functions are all Hyperbolic Tangent and Identity, respectively. The network was trained using
the LM algorithm (1500 epochs). After design, the network computing time concerning the
presentation of a single example (including data pre/postprocessing) is 1.34 x 10^-3 s - Fig. 7
depicts a simplified scheme of some of the network key features. Lastly, all relevant performance
results concerning the proposed ANN are illustrated in 3.7.4.

FIGURE 7. Proposed 3-4-4-4-1 fully-connected MLPN - simplified scheme.

It is worth recalling that, in this manuscript, whenever a vector is added to a matrix, it means
the former is to be added to all columns of the latter (this is valid in MATLAB).

Input Data Preprocessing 

For future use of the proposed ANN to simulate new data $Y_{1,sim}$ ($3 \times P_{sim}$ vector) concerning $P_{sim}$
patterns, the same data preprocessing (if any) performed before training must be applied
to the input dataset. That preprocessing is defined by the methods used for ANN features
2, 3 and 5 (respectively 2, 6 and 1 - see Tab. 2), which should be applied after all (eventual)
qualitative variables in the input dataset are converted to numerical (using feature 1's
method). Next, the necessary preprocessing to be applied to $Y_{1,sim}$, concerning features 2, 3
and 5, is fully described.

Dimensional Analysis and Dimensionality Reduction 

Since neither dimensional analysis (d.a.) nor dimensionality reduction (d.r.) were carried out,

\[ \{Y_{1,sim}\}_{d.r.}^{after} = \{Y_{1,sim}\}_{d.a.}^{after} = Y_{1,sim} \tag{26} \]


Input Normalization 

After input normalization, the new input dataset $\{Y_{1,sim}\}_{n}^{after}$ is defined as function of
the previously determined $\{Y_{1,sim}\}_{d.r.}^{after}$, and they have the same size, reading

r y/ter _ 

where ".x" multiplies component / in the l.h.s vector by all components in row 

ANN-Based Analytical Model 

Once the preprocessed input dataset $\{Y_{1,sim}\}_{n}^{after}$ ($3 \times P_{sim}$ matrix) is determined, the next step
is to present it to the proposed ANN to obtain the predicted output dataset $\{Y_{5,sim}\}_{n}^{after}$
($1 \times P_{sim}$ vector), which will be given in the same preprocessed format of the target dataset
used in learning. In order to convert the predicted outputs to their "original format" (i.e.,
without any transformation due to normalization or dimensional analysis - the only
transformation visible will be the (eventual) qualitative variables written in their numeric
representation), some postprocessing is needed, as described in detail in 3.7.3. Next, the
mathematical representation of the proposed ANN is given, so that any user can implement
it to determine $\{Y_{5,sim}\}_{n}^{after}$, thus eliminating all rumors that ANNs are "black boxes".

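\[
\begin{aligned}
Y_2 &= \varphi_2\left( W_{1-2}^{T} \{Y_{1,sim}\}_{n}^{after} + b_2 \right) \\
Y_3 &= \varphi_3\left( W_{1-3}^{T} \{Y_{1,sim}\}_{n}^{after} + W_{2-3}^{T} Y_2 + b_3 \right) \\
Y_4 &= \varphi_4\left( W_{1-4}^{T} \{Y_{1,sim}\}_{n}^{after} + W_{2-4}^{T} Y_2 + W_{3-4}^{T} Y_3 + b_4 \right) \\
\{Y_{5,sim}\}_{n}^{after} &= \varphi_5\left( W_{1-5}^{T} \{Y_{1,sim}\}_{n}^{after} + W_{2-5}^{T} Y_2 + W_{3-5}^{T} Y_3 + W_{4-5}^{T} Y_4 + b_5 \right)
\end{aligned}
\tag{28}
\]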


0.0222 

0.0082 

0.0042 




(27)

where

\[
\varphi_2 = \varphi_3 = \varphi_4 = \varphi(s) = \frac{e^{s} - e^{-s}}{e^{s} + e^{-s}}, \qquad
\varphi_5 = \varphi(s) = s
\tag{29}
\]


5.92014666514410 -8.42287477536242 

-2.56058135230626 1.89089789773532 

-2.60190678017837 19.1276198184410 


6.12188206788266 16.2938196162533 
15.4065109454804 -16.7015058877463 
2.25430196289090 6.16075643046659 


b. 


3.12956390068096 

-12.1226968878702 

-23.0312462334084 

-5.37192667684186 


(30) 


"0.414738619705944 
= 1.18092219635388 
2.00787682675325 

■-2.63455533785155 
-3.40557389137861 
10.4283984813359 
-3.60024250992970 

-3.19599142132871 " 
0.578891626317490 
0.593077244491166 
-1.81765959400271 


1.32893635513501 

2.26733319392057 

-4.26930673182629 

1.42443597315850 

-0.378443916473808 

-7.59151007212646 

-22.6288310553954 


4.14403332041229 

-0.780958164772676 

-4.04682458597197 

-0.992635855262673 

-2.95088552349602 

8.65915039538676 

18.6518904980882 


-1.74907381715892" 

4.06317925046579 

2.31298898175833 

-1.44068011428239' 

-1.29760727903780 

-2.08761536504187 

-7.02100976668633 


(31) 


-10.1446983162026 

2.92157866250819 

-6.88857125074414 

"7.79122079835436 

7.72109214777274 

11.0635679581952 

-8.46335994709181 

'-3.94857119009544 

10.4006204162642 

11.4879411133621 

-16.4530465325905 


0.587033479365037 

1.06902882835772 

0.633560737751578 

0.939498926219798 

3.69400872747076 

-1.42326742691051 

-2.52477307911999 

0.773568727455620 

6.76962264847463 

-2.66775810317843 

0.239469158882274 


1.10810082763965 

2.02229814934862 

3.32191629973955 

0.994920549794745 

7.99494671486522 

1.20946817714587 

-2.68232307734636 

33.2244475474305 

15.3006148123197 

-25.6404239680795 

1.35702273276274 


-0.719586110812137 

-1.04151405318611 

-1.37693811199200 

-1.44588737852196' 

-0.330044349509708 

1.09672903767770 

-1.46464483126179 

-1.94259316527410 ' 
-0.493929406811997 
6.39085101284729 
-0.544559259202252 


b,= 


9.21240069987668 

0.165758464719451 

1.44718146443782 

-1.77437690033072 


(32)

2.41916145661974

1.38165874874808 

2.23320041672227 



2.73625025655459 


4.09625222832201 

w - 

-0.914148067927541 

W - 

-4.44884606844407 

-2.83722897348479 

-1.58753838697364 


0.199079605256181_ 


8.92516458734120 _ 


12.3854353599432 

14.1529826905679 

-1.07394444351335 

12.9119477184451 


1.93363683792993 


( 33 ) 


Vectors and matrices presented in eqs. (30)-(33) can also be found in [54], aiming to ease 
their implementation by any interested reader. 
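
To show how the layer-by-layer relations of the proposed 3-4-4-4-1 fully-connected network can be evaluated, a minimal pure-Python sketch is given below. It is an illustration under stated assumptions, not the authors' implementation: every layer receives the outputs of all previous layers through a weight matrix of size (nodes in source layer) x (nodes in target layer), hidden layers use the hyperbolic tangent and the output layer the identity. The parameters in the usage example are placeholders (zeros plus an arbitrary output bias), not the values of eqs. (30)-(33), which should be taken from [54].

```python
import math


def forward(x, sizes, weights, biases):
    """Output of a fully-connected feedforward net for one input pattern.

    sizes   : nodes per layer, e.g. [3, 4, 4, 4, 1]
    weights : dict {(a, b): matrix of size sizes[a-1] x sizes[b-1]}, a < b
    biases  : dict {b: list of sizes[b-1] values}, b = 2..len(sizes)
    """
    n_layers = len(sizes)
    outputs = {1: list(x)}
    for b in range(2, n_layers + 1):
        pre = list(biases[b])
        for a in range(1, b):                      # all previous layers
            w = weights[(a, b)]
            for j in range(sizes[b - 1]):
                pre[j] += sum(w[i][j] * outputs[a][i]
                              for i in range(sizes[a - 1]))
        if b == n_layers:
            outputs[b] = pre                       # identity output function
        else:
            outputs[b] = [math.tanh(v) for v in pre]
    return outputs[n_layers]


def zero_parameters(sizes):
    """Placeholder parameters (all zeros) with the right shapes."""
    n = len(sizes)
    weights = {(a, b): [[0.0] * sizes[b - 1] for _ in range(sizes[a - 1])]
               for a in range(1, n) for b in range(a + 1, n + 1)}
    biases = {b: [0.0] * sizes[b - 1] for b in range(2, n + 1)}
    return weights, biases


if __name__ == "__main__":
    sizes = [3, 4, 4, 4, 1]
    weights, biases = zero_parameters(sizes)
    biases[5] = [0.7]                              # arbitrary placeholder
    print(forward([0.1, 0.2, 0.3], sizes, weights, biases))
```

With the published weights and biases loaded into `weights` and `biases`, and the input normalized as described in 3.7.1, the same function would return the network output in its preprocessed format.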

Output Data Postprocessing 

In order to transform the output dataset obtained by the proposed ANN, $\{Y_{5,sim}\}_{n}^{after}$, to
its original format ($Y_{5,sim}$), i.e. without the effects of dimensional analysis and/or output
normalization (possibly) taken in target dataset preprocessing prior to training, the
postprocessing addressed next must be performed.


Non-normalized (just after dimensional analysis) and Original formats

Once $\{Y_{5,sim}\}_{n}^{after}$ is obtained, the following relations hold for its transformation to its
non-normalized format $\{Y_{5,sim}\}_{d.a.}^{after}$ (just after the dimensional analysis stage), and for the latter's
transformation to its original format $Y_{5,sim}$ (with no influence of preprocessing)

\[ Y_{5,sim} = \{Y_{5,sim}\}_{d.a.}^{after} = \{Y_{5,sim}\}_{n}^{after} \tag{34} \]

since no output normalization nor dimensional analysis were carried out. Moreover,
since no negative output values are physically possible for the problem addressed
herein, the ANN prediction should be defined as

\[ Y_{5,sim} = \max\{Y_{5,sim},\ 0\} \tag{35} \]

meaning that no structural damage exists whenever the output yielded by eq. (34) is
negative.

Performance Results 

Results yielded by the proposed ANN can be found either (i) online in [18], where the target
and ANN output values are provided together with the corresponding input dataset, or
(ii) in terms of performance variables defined in sub-section 3.4, as presented next in the
form of several graphs: (ii1) a regression plot (Fig. 8), where network target and output
data are plotted, for each data point, as x- and y-coordinates, respectively - a measure
of quality is given by the Pearson Correlation Coefficient (R), as defined in eq. (1); (ii2) a
performance plot (Fig. 9), where performance values are displayed for several datasets;
and (ii3) an error plot (Fig. 10), where values concern the maximum error and the % of
errors greater than 3%, for all data. It is worth highlighting that all graphical results just
mentioned are based on target and output datasets computed in their original format,
i.e. free of any transformations due to output normalization and/or dimensional analysis.

FIGURE 8. Regression plot for the proposed ANN (see output variable in Fig. 1 (a)). Panel title: Output Var 1, R = 1; horizontal axis: Target.


Further Testing: Prediction of Experimental Results 


Aiming to test the proposed analytical model in the prediction of experimental results,
three test results taken from [16] were considered, as shown in Tab. 7. Only tests I and III
regard damaged members. The errors (smaller than 1%) displayed in Tab. 7 attest once
again the capability of the proposed ANN-based analytical model.

FIGURE 9. Performance plot for the proposed ANN. Plotted values: 0.0% for every dataset shown (legible labels: Training, Validation, Testing).

TABLE 7. ANN performance in the prediction of 3 test results.

Test   Freq. 1   Freq. 2   Freq. 3   Real Damage    ANN-based Damage   Error (%)
       (Hz)      (Hz)      (Hz)      Location (m)   Location (m)       ANN vs Real
I      40.142    117.454   221.441   1.2407         1.2367             0.3
II     42.523    118.771   231.677   No Damage      -0.23 → 0          0
III    39.590    117.305   221.143   1.306          1.315              0.7
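
The errors listed in Tab. 7 can be checked with a few lines of Python. The sketch below uses only the damage locations reported in the table; the function names are hypothetical. Negative network outputs are set to zero, which is read as "no damage" in agreement with 3.7.3.

```python
def predicted_location(raw_output):
    """Negative outputs are not physically possible and mean no damage."""
    return max(raw_output, 0.0)


def error_percent(real, predicted):
    """Relative error (%) between real and predicted damage location."""
    return 100.0 * abs(real - predicted) / abs(real)


if __name__ == "__main__":
    # (test label, real location in m, ANN-based location in m) from Tab. 7
    damaged = [("I", 1.2407, 1.2367), ("III", 1.306, 1.315)]
    for label, real, ann in damaged:
        print(label, round(error_percent(real, ann), 1))
    # Test II (undamaged beam): the raw output of -0.23 is clipped to zero.
    print("II", predicted_location(-0.23))
```

Both damaged cases reproduce the rounded errors of the table (0.3% and 0.7%).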


CONCLUSIONS 

This paper primarily aimed to assess the potential of Artificial Neural Network (ANN)
models in the prediction of damage localization in structural members, as function of
their dynamic properties - the first three natural frequencies were used. Based on 64
numerical examples from damaged (mostly) and undamaged steel channel beams, an
ANN-based analytical model was proposed as a highly accurate and efficient damage
localization estimator. The proposed model yielded maximum errors of 0.2% and 0.7%
concerning 64 numerical and 3 experimental data points, respectively.

Since the approach taken was shown to work well for structural members, the authors'
next step (in the very near future) is to apply similar procedures to entire bridge or
building structures, this time based on much larger datasets in order to provide an
analytical solution with high credibility concerning its generalization capability, i.e. its
capacity of giving good results for a large amount of examples (i) within the ranges
considered for the input variables, and (ii) not considered during ANN development.